auxiliary input
Incorporating Side Information by Adaptive Convolution
Di Kang, Debarun Dhar, Antoni Chan
Computer vision tasks often have side information available that is helpful to solve the task. For example, for crowd counting, the camera perspective (e.g., camera angle and height) gives a clue about the appearance and scale of people in the scene. While side information has been shown to be useful for counting systems using traditional hand-crafted features, it has not been fully utilized in counting systems based on deep learning. In order to incorporate the available side information, we propose an adaptive convolutional neural network (ACNN), where the convolution filter weights adapt to the current scene context via the side information.
Exploring speech style spaces with language models: Emotional TTS without emotion labels
Chandra, Shreeram Suresh, Du, Zongyang, Sisman, Berrak
Many frameworks for emotional text-to-speech (E-TTS) rely on human-annotated emotion labels that are often inaccurate and difficult to obtain. Learning emotional prosody implicitly presents a tough challenge due to the subjective nature of emotions. In this study, we propose a novel approach that leverages text awareness to acquire emotional styles without the need for explicit emotion labels or text prompts. We present TEMOTTS, a two-stage framework for E-TTS that is trained without emotion labels and is capable of inference without auxiliary inputs. Our proposed method performs knowledge transfer between the linguistic space learned by BERT and the emotional style space constructed by global style tokens. Our experimental results demonstrate the effectiveness of our proposed framework, showcasing improvements in emotional accuracy and naturalness. This is one of the first studies to leverage the emotional correlation between spoken content and expressive delivery for emotional TTS.
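A rough sketch of the knowledge-transfer idea, under stated assumptions: a predictor maps a pretrained BERT sentence embedding to weights over global style tokens and is trained (here with a simple MSE loss) to match the style embedding a reference-audio GST encoder produces, so that at inference the emotional style comes from the text alone. Module names, dimensions, and the loss choice below are hypothetical, not the paper's exact design.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TextToStyle(nn.Module):
    """Predicts global-style-token (GST) weights from a BERT sentence embedding."""

    def __init__(self, bert_dim=768, n_tokens=10, token_dim=256):
        super().__init__()
        self.to_weights = nn.Linear(bert_dim, n_tokens)       # text -> token weights
        self.tokens = nn.Parameter(torch.randn(n_tokens, token_dim))

    def forward(self, bert_emb):
        w = torch.softmax(self.to_weights(bert_emb), dim=-1)  # (B, n_tokens)
        return w @ self.tokens                                # (B, token_dim) style embedding

predictor = TextToStyle()
bert_emb = torch.randn(4, 768)        # sentence embeddings from a pretrained BERT
style = predictor(bert_emb)           # conditions the TTS decoder at inference
ref_style = torch.randn(4, 256)       # training target: GST embedding of reference audio
loss = F.mse_loss(style, ref_style)   # pulls the text-derived style toward the audio style
```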
HI-GAN: Hierarchical Inpainting GAN with Auxiliary Inputs for Combined RGB and Depth Inpainting
Dash, Ankan, Gu, Jingyi, Wang, Guiling
Inpainting involves filling in missing pixels or areas in an image, a crucial technique employed in Mixed Reality environments for various applications, particularly in Diminished Reality (DR), where content is removed from a user's visual environment. Existing methods rely on digital replacement techniques that necessitate multiple cameras and incur high costs. AR devices and smartphones use ToF depth sensors to capture scene depth maps aligned with RGB images; despite their speed and affordability, ToF cameras create imperfect depth maps with missing pixels. To address these challenges, we propose Hierarchical Inpainting GAN (HI-GAN), a novel approach comprising three GANs arranged hierarchically for RGBD inpainting. EdgeGAN and LabelGAN inpaint masked edge and segmentation label images respectively, while CombinedRGBD-GAN combines their latent representation outputs and performs RGB and depth inpainting. Edge images and, in particular, segmentation label images as auxiliary inputs significantly enhance inpainting performance through complementary context and hierarchical optimization. To our knowledge, this is the first attempt to incorporate label images into the inpainting process. Unlike previous approaches that require multiple sequential models and separate outputs, our work operates in an end-to-end manner, training all three models simultaneously and hierarchically. Specifically, EdgeGAN and LabelGAN are first optimized separately and then further optimized inside CombinedRGBD-GAN to enhance inpainting quality. Experiments demonstrate that HI-GAN works seamlessly and achieves overall superior performance compared with existing approaches.
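A toy sketch of the hierarchical data flow, under assumptions about the latent interface: two stand-in generators inpaint the masked edge and label images and expose their latent maps, which a third generator concatenates with the masked RGBD input for joint RGB and depth inpainting. Channel counts, module names, and the encoder-decoder shape are illustrative only; discriminators and losses are omitted.

```python
import torch
import torch.nn as nn

class ToyGenerator(nn.Module):
    """Stand-in generator exposing both a latent map and an inpainted output."""
    def __init__(self, cin, cout, latent=32):
        super().__init__()
        self.enc = nn.Sequential(nn.Conv2d(cin, latent, 3, padding=1), nn.ReLU())
        self.dec = nn.Conv2d(latent, cout, 3, padding=1)
    def forward(self, x):
        z = self.enc(x)
        return z, self.dec(z)

edge_gan  = ToyGenerator(cin=2, cout=1)              # masked edge map + mask
label_gan = ToyGenerator(cin=2, cout=1)              # masked label map + mask
combined  = ToyGenerator(cin=4 + 32 + 32, cout=4)    # masked RGBD + both latents

rgbd   = torch.randn(1, 4, 128, 128)                 # masked RGB + depth channels
edges  = torch.randn(1, 2, 128, 128)
labels = torch.randn(1, 2, 128, 128)

z_e, edge_out  = edge_gan(edges)
z_l, label_out = label_gan(labels)
_, rgbd_out = combined(torch.cat([rgbd, z_e, z_l], dim=1))   # joint RGB + depth inpainting
```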
Generating Personas for Games with Multimodal Adversarial Imitation Learning
Ahlberg, William, Sestini, Alessandro, Tollmar, Konrad, Gisslén, Linus
Reinforcement learning has been widely successful in producing agents capable of playing games at a human level. However, this requires complex reward engineering, and the agent's resulting policy is often unpredictable. Going beyond reinforcement learning is necessary to model a wide range of human playstyles, which can be difficult to represent with a reward function. This paper presents a novel imitation learning approach to generate multiple persona policies for playtesting. Multimodal Generative Adversarial Imitation Learning (MultiGAIL) uses an auxiliary input parameter to learn distinct personas using a single-agent model. MultiGAIL is based on generative adversarial imitation learning and uses multiple discriminators as reward models, inferring the environment reward by comparing the agent and distinct expert policies. The reward from each discriminator is weighted according to the auxiliary input. Our experimental analysis demonstrates the effectiveness of our technique in two environments with continuous and discrete action spaces.
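A minimal sketch of the reward blending described above, assuming the standard GAIL surrogate reward; the function name and constants are illustrative, not the paper's exact formulation.

```python
import numpy as np

def multigail_reward(discriminator_scores, aux_input, eps=1e-8):
    """Blend per-persona discriminator rewards according to the auxiliary input.

    discriminator_scores: D(s, a) in (0, 1), one discriminator per expert persona.
    aux_input: persona weights (e.g. one-hot to select a single persona, or a blend).
    Uses the common GAIL surrogate reward -log(1 - D(s, a)).
    """
    scores = np.asarray(discriminator_scores, dtype=float)
    weights = np.asarray(aux_input, dtype=float)
    rewards = -np.log(1.0 - scores + eps)
    return float(weights @ rewards)

# e.g. an "aggressive" and a "defensive" expert; the auxiliary input blends the personas
r = multigail_reward([0.8, 0.3], aux_input=[0.7, 0.3])
```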
Agent-State Construction with Auxiliary Inputs
Tao, Ruo Yu, White, Adam, Machado, Marlos C.
In many, if not all, realistic sequential decision-making tasks, the decision-making agent cannot model the full complexity of the world. The environment is often much larger and more complex than the agent, a setting also known as partial observability. In such settings, the agent must leverage more than just the current sensory inputs; it must construct an agent state that summarizes previous interactions with the world. Currently, a popular approach for tackling this problem is to learn the agent-state function with a recurrent network that takes the agent's sensory stream as input. Many impressive reinforcement learning applications have instead relied on environment-specific functions to augment the agent's inputs for history summarization. These augmentations are done in multiple ways, from simple approaches like concatenating observations to more complex ones such as uncertainty estimates. Although ubiquitous in the field, these additional inputs, which we term auxiliary inputs, are rarely emphasized, and it is not clear what their role or impact is. In this work, we explore this idea further and relate these auxiliary inputs to prior classic approaches to state construction. We present a series of examples illustrating the different ways of using auxiliary inputs for reinforcement learning. We show that these auxiliary inputs can be used to discriminate between observations that would otherwise be aliased, leading to more expressive features that smoothly interpolate between different states. Finally, we show that this approach is complementary to state-of-the-art methods such as recurrent neural networks and truncated back-propagation through time, and acts as a heuristic that facilitates longer temporal credit assignment, leading to better performance.
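As a minimal sketch of one such auxiliary input, consider an exponentially decaying trace of past observations concatenated with the current observation (the specific form and decay value here are assumptions for illustration): the trace summarizes history, so two identical observations reached via different pasts map to different agent states.

```python
import numpy as np

class DecayingTrace:
    """Auxiliary input: an exponentially decaying trace of past observations."""

    def __init__(self, obs_dim, decay=0.9):
        self.trace = np.zeros(obs_dim)
        self.decay = decay

    def agent_state(self, obs):
        # Update the trace, then concatenate it with the current observation
        # so otherwise-aliased observations become distinguishable.
        self.trace = self.decay * self.trace + (1.0 - self.decay) * obs
        return np.concatenate([obs, self.trace])      # input to the value/policy network

aux = DecayingTrace(obs_dim=4)
for obs in (np.array([1., 0., 0., 0.]), np.array([0., 1., 0., 0.])):
    state = aux.agent_state(obs)                      # 8-dimensional agent state
```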
Reinforcement learning improves game testing, EA's AI team finds
This article is part of our reviews of AI research papers, a series of posts that explore the latest findings in artificial intelligence. As game worlds grow more vast and complex, making sure they are playable and bug-free is becoming increasingly difficult for developers. And gaming companies are looking for new tools, including artificial intelligence, to help overcome the mounting challenge of testing their products. A new paper by a group of AI researchers at Electronic Arts shows that deep reinforcement learning agents can help test games and make sure they are balanced and solvable. "Adversarial Reinforcement Learning for Procedural Content Generation," the technique presented by the EA researchers, is a novel approach that addresses some of the shortcomings of previous AI methods for testing games.
Adversarial Reinforcement Learning for Procedural Content Generation
Gisslén, Linus, Eakins, Andy, Gordillo, Camilo, Bergdahl, Joakim, Tollmar, Konrad
We present an approach for procedural content generation (PCG), and improving generalization in reinforcement learning (RL) agents, by using adversarial deep RL. Training RL agents for generalization over novel environments is a notoriously difficult task. One popular approach is to procedurally generate different environments to increase the generalizability of the trained agents. Here we deploy an adversarial model with one PCG RL agent (called Generator), and one solving RL agent (called Solver). The benefit is mainly two-fold: Firstly, the Solver achieves better generalization through the generated challenges from the Generator. Secondly, the trained Generator can be used as a creator of novel environments that, together with the Solver, can be shown to be solvable. The Generator receives a reward signal based on the performance of the Solver which encourages the environment design to be challenging but not impossible. To further drive diversity and control of the environment generation, we propose the use of auxiliary inputs for the Generator. Thus, we propose adversarial RL for procedural content generation (ARLPCG), an adversarial approach which procedurally generates previously unseen environments with an auxiliary input as a control variable. Herein we describe this concept in detail and compare it with previous methods showing improved generalization, as well as a new method to create novel environments.
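A hedged sketch of how an auxiliary input can act as a control variable for the Generator's reward: the sign of the auxiliary input flips whether the Generator is rewarded for environments the Solver struggles with or for ones it completes easily. The shaping terms and constants below are assumptions for illustration, not the paper's reward function.

```python
def generator_reward(solver_progress, solver_solved, aux_input,
                     step_cost=0.01, solve_bonus=1.0):
    """Reward for the PCG Generator, controlled by an auxiliary input.

    solver_progress: fraction of the generated segment the Solver completed.
    solver_solved:   whether the Solver reached the goal.
    aux_input:       control variable in [-1, 1]; negative values favour
                     environments the Solver struggles with, positive values
                     favour easier ones. Shaping terms here are illustrative.
    """
    difficulty_term = aux_input * solver_progress        # sign flips the preference
    completion_term = solve_bonus if solver_solved else 0.0
    return difficulty_term + completion_term - step_cost

r_hard = generator_reward(solver_progress=0.4, solver_solved=False, aux_input=-1.0)
r_easy = generator_reward(solver_progress=0.9, solver_solved=True, aux_input=+1.0)
```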
MC-LSTM: Mass-Conserving LSTM
Hoedt, Pieter-Jan, Kratzert, Frederik, Klotz, Daniel, Halmich, Christina, Holzleitner, Markus, Nearing, Grey, Hochreiter, Sepp, Klambauer, Günter
The success of Convolutional Neural Networks (CNNs) in computer vision is mainly driven by their strong inductive bias, which is strong enough to allow CNNs to solve vision-related tasks with random weights, meaning without learning. Similarly, Long Short-Term Memory (LSTM) has a strong inductive bias towards storing information over time. However, many real-world systems are governed by conservation laws, which lead to the redistribution of particular quantities -- e.g. in physical and economical systems. Our novel Mass-Conserving LSTM (MC-LSTM) adheres to these conservation laws by extending the inductive bias of LSTM to model the redistribution of those stored quantities. MC-LSTMs set a new state-of-the-art for neural arithmetic units at learning arithmetic operations, such as addition tasks, which have a strong conservation law, as the sum is constant over time. Further, MC-LSTM is applied to traffic forecasting, modelling a pendulum, and a large benchmark dataset in hydrology, where it sets a new state-of-the-art for predicting peak flows. In the hydrology example, we show that MC-LSTM states correlate with real-world processes and are therefore interpretable.
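The conservation property can be sketched with a single cell update in which the input gate is a softmax (distributing incoming mass), the redistribution matrix is column-stochastic, and the output gate emits a fraction of the stored mass. The gating choices below are a simplified reading of the described inductive bias, not the exact MC-LSTM equations.

```python
import torch
import torch.nn.functional as F

def mc_lstm_step(c_prev, x_mass, i_logits, R_logits, o_logits):
    """One mass-conserving update (a sketch of the MC-LSTM inductive bias).

    c_prev:   (n,) stored mass per cell.
    x_mass:   scalar mass input to distribute across cells.
    i_logits: (n,) logits for the input gate (softmax -> sums to 1).
    R_logits: (n, n) logits for the redistribution matrix (column-stochastic).
    o_logits: (n,) logits for the output gate in (0, 1).
    Mass is conserved: stored mass + emitted mass equals previous mass + input.
    """
    i = F.softmax(i_logits, dim=0)            # distributes incoming mass
    R = F.softmax(R_logits, dim=0)            # each column sums to 1
    o = torch.sigmoid(o_logits)               # fraction of mass emitted
    m = R @ c_prev + i * x_mass               # redistribute old mass, add new mass
    c = (1.0 - o) * m                         # mass kept in the cells
    h = o * m                                 # mass leaving the system (output)
    return c, h

n = 3
c = torch.tensor([1.0, 0.5, 0.0])
c_new, h = mc_lstm_step(c, torch.tensor(2.0),
                        torch.zeros(n), torch.zeros(n, n), torch.zeros(n))
assert torch.isclose(c_new.sum() + h.sum(), c.sum() + 2.0)   # mass balance holds
```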